Cost-sensitive regression learning on small dataset through intra-cluster product favoured feature selection
Abstract
Many regression and forecasting tasks are cost-sensitive learning problems with asymmetric costs between over-prediction and under-prediction. However, existing classic methods, such as clustering and feature selection, have difficulty dealing with small datasets. One key challenge is that, owing to the limited amount of available data, it is difficult to statistically validate the importance of features using traditional algorithms (e.g. the Boruta algorithm). By leveraging information from the intra-cluster (a group of items with similar attributes), we propose an intra-cluster product favoured (ICPF) feature selection algorithm that selects features on the basis of a filtering method (specifically ... in our study). The experimental results show that ICPF significantly reduces the number of dimensions in the selected feature set and improves the performance of cost-sensitive learning: after adopting the ICPF algorithm, the misprediction cost decreased by 33.5% under the linear-linear cost function and by 32.4% under the quadratic-quadratic cost function. In addition, the advantage of ICPF is robust across other models, such as random forest and XGBoost.
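The abstract reports misprediction cost under linear-linear and quadratic-quadratic cost functions. As a minimal sketch, assuming the common textbook forms (a linear or squared penalty on the prediction error, with separate weights for over- and under-prediction, averaged over samples), the two costs can be computed as below. The function names and the `over`/`under` weights are hypothetical; the paper's exact parameterization is not given here and may differ.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned sample by sample.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def linlin_cost(y_true: Sequence[float], y_pred: Sequence[float],
                over: float = 1.0, under: float = 1.0) -> float:
    """Mean linear-linear cost (assumed form): |error| weighted by `over` or `under`."""
    _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = p - t  # err > 0: over-prediction; err < 0: under-prediction
        total += over * err if err > 0 else under * -err
    return total / len(y_true)


def quadquad_cost(y_true: Sequence[float], y_pred: Sequence[float],
                  over: float = 1.0, under: float = 1.0) -> float:
    """Mean quadratic-quadratic cost (assumed form): squared error weighted by side."""
    _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = p - t
        weight = over if err > 0 else under  # pick the weight for this error's side
        total += weight * err * err
    return total / len(y_true)


if __name__ == "__main__":
    # One over-prediction (+2) and one under-prediction (-3), with under-prediction
    # assumed to be three times as costly as over-prediction.
    print(linlin_cost([10.0, 10.0], [12.0, 7.0], over=1.0, under=3.0))    # 5.5
    print(quadquad_cost([10.0, 10.0], [12.0, 7.0], over=1.0, under=3.0))  # 15.5
```

With `over == under` the lin-lin cost reduces to mean absolute error and the quad-quad cost to mean squared error. Making the two weights unequal is what turns an ordinary regression objective into a cost-sensitive one.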
Similar resources
Cost-sensitive Dynamic Feature Selection
We present an instance-specific test-time dynamic feature selection algorithm. Our algorithm sequentially chooses features given previously selected features and their values. It stops the selection process to make a prediction according to a user-specified accuracy-cost trade-off. We cast the sequential decision-making problem as a Markov Decision Process and apply imitation learning technique...
Cluster-Dependent Feature Selection through a Weighted Learning Paradigm
This paper addresses the problem of selecting a subset of the most relevant features from a dataset through a weighted learning paradigm. We propose two automated feature selection algorithms for unlabeled data. In contrast to supervised learning, the problem of automated feature selection and feature weighting in the context of unsupervised learning is challenging, because label information is...
Cost-Sensitive Feature Selection for On-Body Sensor Localization
Activity recognition systems have demonstrated potential in a broad range of applications. A crucial aspect of creating large scale human activity sensing corpus is to develop algorithms that perform activity recognition in a way that users are not limited to wear sensors on predefined locations on the body. Therefore, effective on-body sensor localization algorithms are needed to detect the lo...
Evaluation of Logistic Regression Model with Feature Selection Methods on Medical Dataset
...gression enable us to investigate the relationship between a categorical outcome and a set of explanatory variables. The outcome or response can be either dichotomous (yes, no) or ordinal (low, medium, high). During dichotomous response, we are performing standard logistic regression and for ordinal response, model that uses standard logistic regression formula with feature selection using forw...
Cost-Sensitive Spam Detection Using Parameters Optimization and Feature Selection
E-mail spam is no more garbage but risk since it recently includes virus attachments and spyware agents which make the recipients’ system ruined, therefore, there is an emerging need for spam detection. Many spam detection techniques based on machine learning techniques have been proposed. As the amount of spam has been increased tremendously using bulk mailing tools, spam detection techniques ...
Journal
Journal title: Connection science
Year: 2021
ISSN: 0954-0091, 1360-0494
DOI: https://doi.org/10.1080/09540091.2021.1970719